Introduction

Advanced Geospatial Data Processing for Social Scientists

Dennis Abel & Stefan Jünger

2025-04-28

The goal of this course

This course will teach you how to exploit R and apply its geospatial techniques in a social science context.

By the end of this course, you should…

  • Be comfortable with using raster data in R
  • Including importing and wrangling single raster layers as well as raster stacks and cubes
  • Have a fundamental understanding of spatial geometries and how these are manipulated in R
  • Be able to create publication-ready maps of raster layers and stacks
  • Not be afraid to explore access to remote sensing APIs for your own research questions

Illustration by Allison Horst

We are (necessarily) selective

There’s a multitude of spatial R packages

  • We cannot cover all of them
  • And we cannot cover all functions
  • You may have used some we are not familiar with

We will show the use of packages we exploit in practice

  • There’s always another way of doing things in R
  • Don’t hesitate to bring up your solutions

You can’t learn everything at once, but you also don’t have to!

Prerequisites for this course

  • Good knowledge of R, its syntax, and internal logic
  • Affinity for using script-based languages
  • Knowledge of fundamentals of geospatial data wrangling and analysis
  • Don’t be scared to wrangle data with complex structures
  • Working versions of R (and Rstudio) on your computer

About us (Dennis)

  • Postdoctoral Researcher in the team Survey Data Augmentation at the GESIS department Survey Data Curation
  • Ph.D. in Political Economy, University of Cologne
  • Research interests:
    • Quantitative methods, Geographic Information Systems (GIS)
    • Environmental attitudes and behavior
    • Public policy
    • Open source software

About us (Stefan)

  • Senior Researcher in the team Survey Data Augmentation at the GESIS department Survey Data Curation
  • Ph.D. in Social Sciences, University of Cologne
  • Research interests:
    • Quantitative methods, Geographic Information Systems (GIS)
    • Social inequalities
    • Attitudes towards minorities
    • Environmental attitudes
    • Reproducible research

About us (Amelie)

  • Intern at the Survey Data Augmentation at the GESIS department Survey Data Curation
  • Undergraduate in Geography, University of Bonn
  • Study Interests:
    • Geographic Information Systems (GIS)
    • Intersectionality of geography and social sciences (e.g. Loss and Damage)
    • Biogeography
    • Climatology
    • Dendrochronology and -ecology

About you

  • What’s your name?
  • Where do you work/research?
  • What are you working on/researching?
  • What is your experience with R or other programming languages?
  • What is your experience with geospatial data?

Course schedule

Day Time Title
April 28 10:00-11:15 Introduction
April 28 11:15-11:30 Coffee Break
April 28 11:30-13:00 Raster data in R
April 28 13:00-14:00 Lunch Break
April 28 14:00-15:15 Raster data processing
April 28 15:15-15:30 Coffee Break
April 28 15:30-17:00 Graphical display of raster data in maps
April 29 10:00-11:15 Datacube processing I
April 29 11:15-11:30 Coffee Break
April 29 11:30-13:00 Datacube processing II & API access
April 29 13:00-14:00 Lunch Break
April 29 14:00-15:15 Data integration and linking (with survey data)
April 29 15:15-15:30 Coffee Break
April 29 15:30-17:00 Outlook and open session with own application

Now

Day Time Title
April 28 10:00-11:15 Introduction
April 28 11:15-11:30 Coffee Break
April 28 11:30-13:00 Raster data in R
April 28 13:00-14:00 Lunch Break
April 28 14:00-15:15 Raster data processing
April 28 15:15-15:30 Coffee Break
April 28 15:30-17:00 Graphical display of raster data in maps
April 29 10:00-11:15 Datacube processing I
April 29 11:15-11:30 Coffee Break
April 29 11:30-13:00 Datacube processing II & API access
April 29 13:00-14:00 Lunch Break
April 29 14:00-15:15 Data integration and linking (with survey data)
April 29 15:15-15:30 Coffee Break
April 29 15:30-17:00 Outlook and open session with own application

Geospatial data in social sciences

A growing interest in economics and the social sciences in geospatial and Earth observation data has led to a broad spectrum of publications in recent years. We have identified four major subject areas which have been addressed with EO data recently:

  • Environmental attitudes and behavior,
  • Economic development and inequality,
  • Conflict and migration,
  • Political behavior.

Geospatial data in social sciences

Hoffmann et al. 2022 analyse how the experience of climate anomalies and extremes influences environmental attitudes and vote intention in Europe

  • Data integration of 1. harmonized Eurobarometer data, 2. EU parliamentary electoral data, and 3. climatological data
  • Aggregation on regional levels (NUTS-2 and NUTS-3)
  • Climatological data from ERA5 reanalysis (CS3)
  • Calculations of temperature anomalies and extremes based on reference period (1971-2000)
  • Findings suggest effect of temperature anomalies (heat, “dry spell”) on environmental concern and vote intention

Geospatial data in social sciences

García-León et al. 2021 investigate historical and future economic impacts of recent heatwaves (2003, 2010, 2015, 2018) in Europe

  • Data integration of 1. heatwave data with 2. population data, and 3. worker productivity data, 4. economic accounts from Eurostat.
  • Aggregation on regional levels
  • Temperature data from ERA5 reanalysis
  • Calculations of heatwaves based on reference period (1981-2010)
  • Findings indicate total estimated damages attributed to heatwaves to 0.3-0.5% of European GDP with high spatial variation (GDP impacts beyond 1% in vulnerable regions)

Geospatial data in social sciences

Jean et al. 2016 show how nighttime maps can be utilized as estimates of household consumption and assets

  • Economic indicators are hard to measure in poorer countries - satellite imagery could be an alternative proxy for it
  • The authors integrate 1. survey data (World Bank’s Living Standards Measurement Surveys - LSMS; and Demographic and Health Surveys - DHS) with 2. nighttime light data in five African countries - Nigeria, Tanzania, Uganda, Malawi, and Rwanda
  • ML approach for image feature extraction in nighttime maps
  • Daytime satellite images from Google Static Maps, nighttime lights from US DMSP
  • Model can explain up to 75% of variation in local-level economic outcomes

Geospatial data in social sciences

Increased amount of available data

  • Quantitative and on a small spatial scale
  • Often open source and free access

Better tools

  • Standard software, such as R, can be used as Geographic Information System (GIS)

Geospatial data in this course I

In the folder called ./data, you can find (most of) the data files prepared for all the exercises and slides. The following data are included:

Geospatial data in this course II

  • WorldPop data are provided by a research group based at the University of Southampton, offering high-resolution, open-access data on population distribution and demographics. The datasets combine census information, satellite imagery, and statistical modeling to support research, policy-making, and humanitarian efforts worldwide.

  • Nighttime lights are derived from NASA’s Black Marble (Prefix VNP46A4_). The data has been accessed with Robert Marty’s and Gabriel Stefanini Vicente’s (2025) blackmarbler package.

  • ERA5 is provided by the Copernicus Climate Change Service (C3S) and offers high-resolution, global reanalysis data on atmospheric, land, and oceanic variables. The datasets combine observations with models to produce consistent climate information used for research, forecasting, and environmental monitoring.

Geospatial data in this course III

  • The International Social Survey Programme (ISSP) available at GESIS is provided by a cross-national collaboration of research organizations, delivering annual survey data on social attitudes and behaviors. The datasets cover a wide range of topics like work, family, environment, and social inequality, supporting comparative social science research worldwide.

  • The synthetic survey geocoordinates dataset is a simulated dataset comprising 2,000 spatial coordinates and one synthetic attribute related to migration status. It is designed for training and testing spatial analysis workflows without the constraints of real-world data privacy issues and was created using Stefan’s experimental geosynth R package.

Please make sure that if you reuse any of the provided data to cite the original data sources.

Refresher: What are geospatial data?

Data with a direct spatial reference

\(\rightarrow\) geo-coordinates x, y (and z)

Visualizing geometries in different styles depending on format:

  • Vector data (points, lines, polygons)
  • Raster data (grids)
  • Coordinate Reference System (CRS)

Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019

Refresher: Types of CRS

You may hear from geographic, geocentric, projected, or local CRS.

What’s the difference?

  • whether 2 dimensional (longitude, latitude) or 3 dimensional (+height) coordinates are used
  • the location of the coordinate system’s origin (center of earth or not)
  • projection on a flat surface (transformation of longitudes and latitudes to x and y coordinates)
  • location (the smaller, the more precise the projections)

In practice, what matters most is that two or more layers match when integrating them.

Refresher: Coordinate reference system (CRS)

  • CRS is a reference system to determine the precise location of points in space
  • GIS programs MUST know CRS for accurate processing, visualization, and analysis of data
  • CRS is based on the Geographic Coordinate System (GCS) + the Projected Coordinate System (PCS)

Refresher: Geographic Coordinate System (GCS)

Necessary to know where exactly on Earth’s surface data is located

  • GCS uses three-dimensional spherical surface to define locations based on datum and latitude and longitude lines
  • Datum: Mathematical model of the Earth that serves as reference point by defining size and shape of Earth
  • Local datum: Optimizes fit for particular location (like NAD83)
  • Geocentric datum: Optimizes fit for entire Earth (like Word Geodetic Survey 1984 - WGS84)
  • WGS84 is standard for GPS and many applications

Source: Caitlin Dempsey

Refresher: Projected Coordinate System (PCS)

Necessary to draw the data on a flat map

  • PCS represents Earth’ surface on a flat plane by mathematical transformations (projections)
  • Coordinate grid: Here we talk about x and y coordinates (= easting and northing)
  • Conversion of degrees of latitude and longitude into measurable units (like meters)

Different projection approaches. Left: Planar, middle: conic, right: cylindrical. Source

Refresher: Common PCS - UTM

Universal Transverse Mercator (UTM) is a global map projection which:

  • Projects globe onto a cylinder tangent to a central meridian
  • Divides it into 60 zones
  • Distortion is minimized within each zone
  • Provides high accuracy for small areas

Source

Refresher: Layers Must Match!

EPSG:3857

EPSG:3035

Source: Statistical Office of the European Union Eurostat (2018) / Jünger, 2019

Refresher: Documentation of CRS

Every geodata object requires a description of the CRS

  • GCS and datum
  • PCS
  • x and y units (like meters)
  • Domain (maximum allowable x and y values)
  • Resolution

Refresher: Old standard: PROJ.4 strings

This is how your information about the CRS are defined in a classic standard:

+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 

Source: https://epsg.io/3035

(It’s nothing you would type by hand)

Refresher: WKT (“Well Known Text”)


PROJCS["ETRS89 / LAEA Europe",
    GEOGCS["ETRS89",
        DATUM["European_Terrestrial_Reference_System_1989",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6258"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4258"]],
    PROJECTION["Lambert_Azimuthal_Equal_Area"],
    PARAMETER["latitude_of_center",52],
    PARAMETER["longitude_of_center",10],
    PARAMETER["false_easting",4321000],
    PARAMETER["false_northing",3210000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AUTHORITY["EPSG","3035"]]

Source: https://epsg.io/3035

Refresher: EPSG Codes

Eventually, working with CRS in R will not be as challenging as it may seem since we don’t have to use PROJ.4 or WKT strings directly.

Most of the time, it’s enough to use so-called EPSG Codes (“European Petroleum Survey Group Geodesy”), a small digit sequence.

Refresher: What is GIS?

Most common understanding: Geographic Information Systems (GIS) as specific software to process geospatial data for

  • Visualization
  • Analysis
  • Interpretation

\(\rightarrow\) In our case, of course, it is R

But base R is limited when it comes to handling geospatial data

Packages in this course I 📦

We will use plenty of different packages during the course, but only a few are our main drivers (e.g., the terra package). Here’s the list of packages you may need for the exercises:

Packages in this course II 📦

Exercise 1_1: Package Installation

AI-assisted picture